在肺结节的管理中,我们希望根据其在计算机断层扫描(CT)扫描的直径变化方面预测结节的演变,然后根据结节不断增长的趋势的预测结果提供后续建议。为了提高肺结节增长趋势预测的性能,与连续CT扫描中相同结节的变化进行比较至关重要。在此激励的情况下,我们从国家肺筛查试验(NLST)数据集进行了两次以上的CT扫描,筛选了4,666名受试者,以组织一个名为NLSTT的颞数据集。在具体上,我们首先检测并配对感兴趣的区域(ROI),该区域涵盖了基于注册的CT扫描的相同结节。之后,我们通过模型预测结节的纹理类别和直径大小。最后,我们根据直径的变化来注释每个结节的演化类别。基于构建的NLSTT数据集,我们建议一个暹罗编码器同时利用从连续的CT扫描中检测到的3D ROI的判别特征。然后,我们在新小时设计一个时空混合器(STM)来利用连续3D ROI中同一结节的间隔变化,并捕获结节区域的空间依赖性和当前的3D ROI。根据临床诊断常规,我们采用层次损失来更多地关注生长的结节。我们有组织的数据集上的广泛实验证明了我们提出的方法的优势。我们还对内部数据集进行了实验,以通过将其与熟练的临床医生进行比较来评估我们方法的临床实用性。
translated by 谷歌翻译
许多实际关系系统,如社交网络和生物系统,包含动态相互作用。在学习动态图形表示时,必须采用连续的时间信息和几何结构。主流工作通过消息传递网络(例如,GCN,GAT)实现拓扑嵌入。另一方面,时间演进通常通过在栅极机构中具有方便信息过滤的存储单元(例如,LSTM或GU)来表达。但是,由于过度复杂的编码,这种设计可以防止大规模的输入序列。这项工作从自我关注的哲学中学习,并提出了一种高效的基于频谱的神经单元,采用信息的远程时间交互。发达的频谱窗口单元(SWINIT)模型预测了具有保证效率的可扩展动态图形。该架构与一些构成随机SVD,MLP和图形帧卷积的一些简单的有效计算块组装。 SVD加MLP模块编码动态图事件的长期特征演进。帧卷积中的快速帧图形变换嵌入了结构动态。两种策略都提高了模型对可扩展分析的能力。特别地,迭代的SVD近似度将注意力的计算复杂性缩小到具有n个边缘和D边缘特征的动态图形的关注的计算复杂性,并且帧卷积的多尺度变换允许在网络训练中具有足够的可扩展性。我们的Swinit在各种在线连续时间动态图表学习任务中实现了最先进的性能,而与基线方法相比,可学习参数的数量可达七倍。
translated by 谷歌翻译
顺序推荐旨在为特定时间戳在特定时间戳提供历史行为中为用户选择最合适的项目。现有方法通常根据像马尔可夫链等转换的方法模拟用户行为序列。然而,这些方法也隐含地假设用户在不考虑用户之间的影响而彼此独立。实际上,这种影响在序列推荐中发挥着重要作用,因为用户的行为容易受其他人的影响。因此,期望聚合用户行为和用户之间的影响,这些用户在时间上演化并涉及用户和项目的异构图。在本文中,我们纳入了动态用户项异构图,提出了一种新的顺序推荐框架。结果,可以考虑历史行为以及用户之间的影响。为此,我们首先将顺序建议形式正式确定估计时间动态异构图和用户行为序列的条件概率的问题。之后,我们利用条件随机字段来聚合异构图形和用户行为以进行概率估计,并采用伪似然方法来得出易行目标函数。最后,我们提供所提出的框架的可扩展和灵活的实现。三个现实世界数据集的实验结果不仅展示了我们所提出的方法的有效性,而且还提供了一些关于顺序推荐的有洞察力的发现。
translated by 谷歌翻译
对抗性扰动对于证明深度学习模型的鲁棒性至关重要。通用的对抗扰动(UAP)可以同时攻击多个图像,因此提供了更统一的威胁模型,从而避免了图像攻击算法。但是,当从不同的图像源绘制图像时(例如,具有不同的图像分辨率)时,现有的UAP生成器不发达。在图像来源的真实普遍性方面,我们将UAP生成的新颖看法是一个定制的几个实例,它利用双杆优化和学习优化的(L2O)技术(L2O)技术,以提高攻击成功率(ASR)(ASR) )。我们首先考虑流行模型不可知的元学习(MAML)框架,以将UAP生成器元素进行。但是,我们看到MAML框架并未直接提供跨图像源的通用攻击,从而要求我们将其与L2O的另一个元学习框架集成在一起。元学习UAP发电机(i)的最终方案的性能(ASR高50%)比预计梯度下降等基线的方案(II)比香草L2O和MAML框架的性能更好(37%)(当适用),(iii)能够同时处理不同受害者模型和图像数据源的UAP生成。
translated by 谷歌翻译
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point clouds tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on nuScenes benchmark. Moreover, CMT has a strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
translated by 谷歌翻译
Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly-added relations which do not have many know triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to challenge the problem of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.
translated by 谷歌翻译
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with limited several support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are two folds: Firstly, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Secondly, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we firstly design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking results on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves a competitive performance compared to existing approaches across different shots, e.g., we boost nAP by noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
translated by 谷歌翻译
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs are with a large number of parameters, which makes these GNNs computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a light-weighted model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborates that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
translated by 谷歌翻译
This paper focuses on designing efficient models with low parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, trading-off model accuracy and constrained resources still need further improvements. This work rethinks the essential unity of efficient Inverted Residual Block in MobileNetv2 and effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance though sharing the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Massive experiments on ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 that surpass \textbf{SoTA} CNN-/Transformer-based models, while trading-off the model accuracy and efficiency well.
translated by 谷歌翻译
The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
translated by 谷歌翻译